In many investigations the engineer or hydrologist must estimate the frequency with which
certain hydrologic events will be equaled or exceeded. This is done not only in the design of
reservoirs, floodways, and similar projects, but also in the economic evaluation of flood
protection projects and the design of irrigation systems. Frequency studies are based on past
records, when available, or on synthetic procedures, where there are no records from the area
under investigation. Because hydrologic data have been gathered for only a short time, it is
necessary to use a method of analysis that will extrapolate beyond the period of record. There
are several ways in which this can be done. Beard (2)² states:

"Estimates of the frequencies with which various magnitudes of rainfall or stream-flow
values will be exceeded in the future is based on a study of the frequencies with which they were
exceeded in the past. Ordinary records, however, consist of a relatively few events, and
consequently are a poor guide to future possibilities. An adequate frequency analysis usually will
have to draw on information other than that contained in the record, either by using general
accumulated knowledge (judgment) or by the combination of available information by means of
statistical processes. Graphically, frequencies are evaluated simply by arranging observed
values in the order of magnitude and considering that a smooth curve suggested by that array
of values is representative of future possibilities. In the application of statistical procedures,
the concept of theoretical distributions is employed. A distribution is a set of values that would
occur under fixed conditions in an infinite amount of time. Those that have occurred are
presumed to constitute a random sample and accordingly are used to make particular inferences
regarding their 'parent distribution' (i.e. the distribution from which they were derived). Such
inferences are necessarily attended by considerable uncertainty, because any given set of
observations could result from any of many sets of physical conditions (from any one of many
distributions). By statistical process, the most probable nature of the distribution from which
the data were derived can be evaluated. Furthermore, the ranges within which the characteristics
of the true distribution lie, corresponding to any specified degree of reliability, can be evaluated.
Once a distribution is established, it is a simple and straightforward process to calculate
required probabilities. The probability of exceeding a specified value in a single event is equal to
the proportion of events in the parent distribution that are greater than that value."

From this explanation it can be seen that statistical procedures lend themselves to
frequency analysis if two assumptions are made: (1) The recorded data fit a theoretical
distribution; and (2) the recorded data constitute a random sample of their parent population.

These assumptions are commonly made, although there are some differences of opinion
regarding the type of theoretical distribution into which hydrologic data should be placed. Most
authorities agree, however, on one of the following distributions: (1) The logarithmic-normal (6) (8);
(2)  the  extreme  value  (4)  (7);  (3)  the  Poisson  (£);  or  (4)  the  gamma  (1)  (5). 


1 Hydraulic Engineer, Northwest Branch, Soil and Water Conservation Research Division,
Agricultural Research Service, U.S. Department of Agriculture.

2  Underscored  numbers  in  parentheses  refer  to  Literature  Cited  at  end  of  publication. 


The selection of one of these methods is a matter of preference of the individual or
organization making the analysis. This discussion is confined to the log-normal or so-called
"Hazen" method.

Hazen frequency lines are determined by one of two basic procedures: analytical (computing)
or graphical (plotting). This report presents both procedures and discusses the advantages and
disadvantages of each. Very little of the work here is original with the writer, but he will
discuss the lines of reasoning followed in the use of each method.


HAZEN  COMPUTING  METHOD 

The Hazen method is based on the logarithmic-normal distribution theory, which holds that
the logarithms of a series of hydrologic observations are normally distributed. The frequency
relationship following this theory approximates a straight line on log-normal graph paper. In
certain instances the distribution is not normal but skewed, and it is then represented by a
curved line on log-normal paper. Skewness can result from any or all of the following reasons:

1. A poor sample that is not truly representative of the parent population. This condition
is most likely to occur when the data are collected during a relatively short period that
includes years of severe drought or years of excessive rainfall and runoff.

2. Consistent bias in the data. Bias can be introduced in many ways. Some examples are:

(a) A nonrecording rain gage that leaks, or readings by an inept observer who does not
record the smaller rains.

(b) A stream that has either natural or artificial regulation above the point of
measurement. Examples of this are the sand-bottom "losing" streams of the Southwest and
streams that have dams.

(c) Introduction of agricultural, municipal, or industrial waste water to a stream above
the point of measurement, which will cause all recorded flows to be higher than the natural
flow of the stream.

3.  The  population  does  not  fit  a  log-normal  distribution. 

The basic assumption in the Hazen method is that hydrologic data do fit a log-normal
distribution. A great number of data are required to prove otherwise statistically, and reasons 1
and 2 above should be considered very carefully before reaching conclusion 3.

In any event, a skewed distribution should be considered a special case in frequency
analysis, and in this report a log-normal distribution will be assumed.
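
What the log-normal assumption implies for a single event can be sketched in a few lines of
Python. This is only an illustration: the function name and the parameter values (a mean
logarithm of 4.0 and a standard deviation of 0.12) are hypothetical and are not taken from any
record. The percent chance of exceedance is the upper-tail area of the normal distribution of
the logarithms.

```python
import math


def percent_chance_exceeded(discharge, mean_log, sd_log):
    """Percent chance that a single event exceeds `discharge`,
    assuming log10(discharge) is normally distributed."""
    z = (math.log10(discharge) - mean_log) / sd_log
    # Upper-tail area of the standard normal distribution.
    return 100.0 * 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical parameters, chosen only for illustration.
MEAN_LOG, SD_LOG = 4.0, 0.12
for q in (10_000, 20_000):
    print(q, round(percent_chance_exceeded(q, MEAN_LOG, SD_LOG), 1))
```

A discharge equal to the antilog of the mean is exceeded half the time, and one that lies one
standard deviation above the mean (in logarithms) is exceeded about 15.9 percent of the time.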

In the computation of a Hazen frequency line, standard statistical procedures are used to
calculate the mean and the standard deviation of the logarithms of a set of hydrologic data.
The antilogs of the mean and of the mean plus and minus one standard deviation, plotted at the
50-, 15.9-, and 84.1-percent points, respectively, define a straight line on log-normal paper
that represents the theoretically most probable frequency relationship for the data used. An
example of the computation method is shown in table 1.


TABLE 1.--Computation of Hazen frequency line
(Wynoochee River at Oxbow, near Aberdeen, Wash.)

Discharge data

Year      Annual peak discharge      Log. peak discharge
                (c.f.s.)
1926             8,350                    3.9217
1927             7,790                    3.8915
1928             9,910                    3.9961
1929             5,560                    3.7451
1930             7,870                    3.8960
1931            11,000                    4.0414
1932            12,900                    4.1106
1933             9,230                    3.9652
1934            14,000                    4.1461
1935            18,000                    4.2553
1936             6,830                    3.8344
1937             8,830                    3.9460
1938            11,100                    4.0453
1939            13,300                    4.1293
1940            11,400                    4.0569
1941            10,600                    4.0253
1942             9,270                    3.9671
1943            10,000                    4.0000
1944             9,510                    3.9782
1945            12,800                    4.1072
1946             6,820                    3.8338
1947             9,750                    3.9890
1948            10,700                    4.0294
1949             8,630                    3.9360
1950            16,400                    4.2148
1951            14,200                    4.1523
1952             7,660                    3.8842

Method of computation (X = log. peak discharge)

 1. Number of items $= N = 27$.
 2. Sum of X's $= S(X) = 108.0982$.
 3. Mean $= S(X)/N = \bar{X} = 4.0036$.
 4. Sum of squares $= S(X^2) = 433.1625$.
 5. Squared sum $= [S(X)]^2 = 11{,}685.2208$.
 6. Correction for sum of squares $= [S(X)]^2/N = 432.7860$.
 7. Sum of squared deviations $= S(X^2) - [S(X)]^2/N = S(d^2) = 433.1625 - 432.7860 = 0.3765$.
 8. Variance $= S(d^2)/(N-1) = (s^2) = 0.3765/26 = 0.0145$.
 9. Standard deviation $= \sqrt{s^2} = (s) = \sqrt{0.0145} = 0.1205$.
10. Mean plus (s) $= 4.1241$.
11. Mean minus (s) $= 3.8831$.
12. Antilog mean = antilog 4.0036 = 10,010.
13. Antilog mean plus (s) = antilog 4.1241 = 13,300.
14. Antilog mean minus (s) = antilog 3.8831 = 7,640.
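
The steps of table 1 can be retraced with a short Python sketch. The logarithms are copied
as printed in the table; the function and variable names are illustrative only. Because the
sketch carries full precision instead of rounding at each step, it may differ from the hand
computation in the last digit or two.

```python
import math

# Logarithms of annual peak discharge, copied as printed in table 1 (1926-52).
LOG_PEAKS = [
    3.9217, 3.8915, 3.9961, 3.7451, 3.8960, 4.0414, 4.1106, 3.9652, 4.1461,
    4.2553, 3.8344, 3.9460, 4.0453, 4.1293, 4.0569, 4.0253, 3.9671, 4.0000,
    3.9782, 4.1072, 3.8338, 3.9890, 4.0294, 3.9360, 4.2148, 4.1523, 3.8842,
]


def hazen_line(logs):
    """Return N, the mean, and the standard deviation of the logs (steps 1 to 9)."""
    n = len(logs)
    sum_x = sum(logs)                       # step 2: S(X)
    mean = sum_x / n                        # step 3
    sum_sq = sum(x * x for x in logs)       # step 4: S(X^2)
    correction = sum_x ** 2 / n             # steps 5 and 6: [S(X)]^2 / N
    sum_d2 = sum_sq - correction            # step 7: S(d^2)
    variance = sum_d2 / (n - 1)             # step 8
    return n, mean, math.sqrt(variance)     # step 9


n, mean, s = hazen_line(LOG_PEAKS)
print(f"N = {n}, mean = {mean:.4f}, s = {s:.4f}")
# Steps 10 to 14: the three points that define the line.
for label, x in (("mean", mean), ("mean + s", mean + s), ("mean - s", mean - s)):
    print(f"antilog {label}: {10 ** x:,.0f} c.f.s.")
```

Evaluating the antilog of 4.0036 directly gives about 10,080 c.f.s., somewhat higher than the
10,010 printed in step 12; the other two antilogs agree with the table to three significant
figures.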

The  main  advantages  in  the  Hazen  computing  method  are: 

(1)  Since  it  is  strictly  a  computational  process  and  no  personal  judgment  is  involved,  two  or 
more  individuals  working  with  the  same  data  will  produce  identical  frequency  lines. 

(2)  Statistical  computations  lend  themselves  to  mathematical  evaluation  of  the  reliability 
of  the  results. 

Some  disadvantages  of  the  method  are: 

(1)  Each  item  in  a  group  of  data  is  given  equal  weight  in  constructing  the  frequency  line. 
This  means  that  one  or  two  items  that  have  a  much  larger  variation  from  the  mean  than  all 
other  items  could  have  a  strong  influence  on  both  the  position  and  the  slope  of  the  line. 

(2)  Computation  of  the  line  makes  interpretation  of  the  results  more  difficult  than  if  the 
individual  items  in  a  group  of  data  are  plotted  and  made  available  for  visual  study. 

(3)  This  method  is  fairly  slow  because  one  must  look  up  the  logarithm  of  each  item  in  a 
group  of  data  from  a  table  and  then  make  the  computations.  Errors  in  mathematics  are  easily 
made  and  difficult  to  discover  without  going  through  the  entire  process  twice  as  a  check. 

Because  of  these  disadvantages,  the  computing  method  is  generally  not  used,  and  the  plotting 
method  is  preferred  by  most  technicians.  However,  in  some  cases  a  combination  of  the  two 
methods  is  used  and  is  considered  to  be  a  good  practice. 


HAZEN  PLOTTING  METHOD 


The  second  method  that  is  used  to  construct  Hazen  frequency  lines  is  that  of  defining  the 
line  by  plotting  individual  items  of  data  from  an  array  at  selected  "plotting  positions."  This  is 
analogous  to  the  procedure  whereby  stream  discharge  measurements  are  plotted  against  stage 
measurements  to  define  a  theoretical  stage-discharge  rating  curve  for  a  control  section  of  a 
stream.  The  difference  in  the  two  procedures  lies  in  the  fact  that  in  the  case  of  the  stream 
measurements  both  the  stage  and  the  discharge  can  be  measured,  whereas  in  the  case  of  the 
frequency  line  it  is  difficult  to  measure  the  frequency  with  which  any  given  event  is  equaled  or 
exceeded.  The  problem  then  becomes  one  of  developing  a  method  of  finding  "plotting  positions" 
in  terms  of  frequency  or  percent  chance  of  occurrence.  Various  ways  of  doing  this  are  used 
that  require  an  equation  or  other  systematic  procedure. 


Methods  Used  To  Determine  Plotting  Positions 

Four methods of determining plotting positions are presented below, along with the
reasoning that was followed in developing each. No attempt is made to recommend one procedure over
another, but each individual may decide which method has the best logical-statistical backing.

1. In the first, and most basic, method it is reasoned that since the largest event in a period
of Y years has been equaled or exceeded once, it should be assigned a frequency of Y years. In
other words, the largest event recorded in a 20-year period should have a frequency (recurrence
interval) of 20 years. This can be expressed in the following equation:

$$F = \frac{Y}{n} \tag{1a}$$

where F = recurrence interval or frequency, Y = number of years of record, and n = order
number in an array (n = 1 for the largest event). A different form of this same equation can be
used to find the plotting position:

$$P = \frac{100\,n}{Y} \tag{1b}$$

where P = plotting position or percent chance of occurrence. With regard to this method Beard (3)
states:

"This reasoning is not rigorous, because it can similarly be reasoned that all events have been
equal to or smaller than the largest recorded, giving it an exceedence interval of infinity."

A study of this statement shows that perhaps method 1 is not the most logical procedure to use.

2. The second method for finding plotting positions is used quite extensively by hydrologists.
Beard (3) explains:

"The second line of reasoning is that the largest event is representative of largest events
of other similar periods, i.e., that half of the 20-year periods at a location will have maximum
events smaller than the one in question. This appears to be quite reasonable (accepting the
inevitable assumption that the record is a random selection) and results in the conclusion that
the largest event of a 20-year period will be exceeded in only half of all 20-year periods."

If this reasoning is followed to its logical conclusion, the largest event experienced in a
period of Y years will have a recurrence interval, on the average, equal to 2Y. The equation
used to find the recurrence interval by this second method is:

$$F = \frac{2Y}{2n - 1} \tag{2a}$$

and to find plotting positions:

$$P = \frac{100\,(2n - 1)}{2Y} \tag{2b}$$

The  reasoning  that  is  followed  in  arriving  at  equations  2a  and  2b  neglects  the  fact  that  the 
largest  event  of  the  period  under  study  will  be  exceeded  more  than  one  time  in  some  of  the 
other  periods  of  equal  length.  Therefore,  its  frequency  should  actually  be  somewhat  less  than  2Y. 

The discussion of the first two methods shows that any method of determining plotting
positions (or frequency) should result in the following:

(a) The largest event of a period of record should have a recurrence interval of something
greater than Y.

(b) The largest event should have a recurrence interval less than 2Y.

(c) The plotted points should closely follow the computed line.

Different methods and equations can be used to accomplish these three objectives.
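
Limits (a) and (b) come from setting n = 1 in the first two recurrence-interval equations. A
short worked check, using only the symbols F, Y, and n already defined, with the 27-year
record of table 1 as the numerical case:

```latex
\begin{align*}
\text{(1a)}\quad F &= \frac{Y}{n} = Y      && \text{for } n = 1,\\
\text{(2a)}\quad F &= \frac{2Y}{2n - 1} = 2Y && \text{for } n = 1,
\end{align*}
so that an acceptable method should give $Y < F < 2Y$ for the largest event.
With $Y = 27$ this is a recurrence interval between 27 and 54 years.
```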

3. One equation that is in common use is

$$F = \frac{Y + 1}{n} \tag{3a}$$

or its other form,

$$P = \frac{100\,n}{Y + 1} \tag{3b}$$

Although equation 3b will stand the test of the three rules listed above under (a), (b), and (c),
there is some question as to its applicability because of the strong similarity to equation 1b.
It seems that if the largest event should have a frequency greater than Y, it should also have a
frequency greater than (Y + 1). However, equation 3b is used and accepted.

4. The last method is one of compromise. In it the assumption is made that if the largest
event of a period of record should have a frequency greater than Y and less than 2Y, then it
should be halfway between the two, at $\frac{3}{2}Y$. This is expressed by the equation:

$$F = \frac{3Y(Y - 1)}{n(3Y - 4) - (Y - 2)} \tag{4a}$$

and the plotting positions are found by

$$P = \frac{100\,n(3Y - 4) - 100\,(Y - 2)}{3Y(Y - 1)} \tag{4b}$$

Apparently  Beard  followed  this  line  of  reasoning,  since  the  plotting  positions  found  in  the  tables 
of  reports  (2)  and  (3)  are  the  same  as  the  plotting  positions  computed  from  equation  4b. 
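
The four plotting-position equations can be compared numerically with a brief Python sketch.
The function names are illustrative, and the recurrence interval is taken as F = 100/P, which
is the relation between each "a" and "b" equation above. The form of equation 4b used here
reproduces the equation-4b column of table 2 for the 27-year record.

```python
def plotting_positions(n, years):
    """Percent chance of occurrence for order number n (1 = largest event)
    in a record of `years` annual values, by equations 1b, 2b, 3b, and 4b."""
    return {
        "1b": 100.0 * n / years,
        "2b": 100.0 * (2 * n - 1) / (2 * years),
        "3b": 100.0 * n / (years + 1),
        "4b": (100.0 * n * (3 * years - 4) - 100.0 * (years - 2))
              / (3 * years * (years - 1)),
    }


def recurrence_interval(percent):
    """Recurrence interval in years, taken as F = 100 / P."""
    return 100.0 / percent


YEARS = 27  # length of the record used in table 1
for n in (1, 14, 27):
    row = plotting_positions(n, YEARS)
    print(n, {eq: round(p, 1) for eq, p in row.items()})

# Recurrence interval assigned to the largest event by each method.
largest = plotting_positions(1, YEARS)
print({eq: round(recurrence_interval(p), 1) for eq, p in largest.items()})
```

For the largest event of the 27-year record the four methods give recurrence intervals of 27,
54, 28, and 40.5 years, so only the last two fall strictly between Y and 2Y.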


Advantages  and  Disadvantages 

As  with  the  computing  method,  there  are  also  advantages  and  disadvantages  to  the  plotting 
method.  Probably  the  main  disadvantage  lies  in  the  fact  that  personal  judgment  enters  into  the 
positioning  of  the  frequency  line  after  the  points  are  plotted.  The  procedure  is  to  first  get  the 
general  slope  of  the  line  from  the  plotted  points  and  then  draw  the  line  with  some  of  the  points 
on  the  line  and  about  half  the  remaining  points  above  and  the  other  half  below  the  line.  In  doing 
this  the  points  that  plot  far  off  the  line  are  given  less  weight  than  the  points  that  follow  the  line 
more  closely.  This  can  cause  a  situation  where  two  or  more  individuals  can  arrive  at  different 
results  even  though  all  used  the  same  basic  data.  This  is  more  likely  to  happen  with  short 
periods  of  record  than  with  long  ones. 

The  major  advantages  of  the  plotting  method  are: 

(1)  Each  individual  item  of  data  in  the  array  is  made  available  for  visual  study. 

(2)  The  points  are  available  for  visually  estimating  the  quality  of  the  records.  Even  though 
the  individual  measurements  of  each  item  in  the  array  may  be  at  a  high  order  of  accuracy,  the 
distribution  of  the  values  within  the  sample  may  be  such  that  the  sample  as  a  whole  is  of  poor 
quality.  If  the  plotted  points  scatter  a  great  deal  or  have  pronounced  curvature,  then  further 
investigations  should  be  made  before  attempting  to  interpret  the  results. 

(3)  The  method  is  straightforward  and  does  not  require  a  great  deal  of  statistical  training 
or  the  use  of  an  automatic  calculator,  which  is  almost  a  necessity  with  the  computing  method. 


Comparison  of  Results 

The  four  methods  of  finding  plotting  positions  are  demonstrated  in  figures  1,  2,  3,  and  4. 
In  each  case  the  line  was  computed  to  be  the  statistically  most  probable  frequency  line,  as  shown 
in  table  1,  and  the  points  were  plotted  using  the  various  equations,  as  shown  in  table  2.  The 
four  equations  can  be  compared  visually  by  assuming  that  the  computed  line  is  accurate  and 
then  observing  how  closely  the  plotted  points  follow  the  line. 
TABLE 2.--Data for construction of Hazen frequency line by plotting
(Wynoochee River at Oxbow, near Aberdeen, Wash.)

                            Plotting positions (percent)
        Peak
 n    discharge    Equation    Equation    Equation    Equation
      (c.f.s.)        1b          2b          3b          4b

 1     18,000         3.7         1.9         3.6         2.5
 2     16,400         7.4         5.6         7.1         6.1
 3     14,200        11.1         9.3        10.7         9.8
 4     14,000        14.8        13.0        14.4        13.4
 5     13,300        18.5        16.7        17.9        17.1
 6     12,900        22.2        20.4        21.4        20.8
 7     12,800        25.9        24.1        25.0        24.4
 8     11,400        29.6        27.8        28.6        28.1
 9     11,100        33.3        31.5        32.1        31.8
10     11,000        37.0        35.2        35.7        35.4
11     10,700        40.7        38.9        39.3        39.1
12     10,600        44.4        42.6        42.9        42.7
13     10,000        48.2        46.3        46.4        46.4
14      9,910        51.9        50.0        50.0        50.0
15      9,750        55.6        53.7        53.6        53.7
16      9,510        59.3        57.4        57.1        57.4
17      9,270        63.0        61.1        60.7        61.0
18      9,230        66.7        64.8        64.3        64.7
19      8,830        70.4        68.5        67.9        68.4
20      8,630        74.1        72.2        71.4        72.0
21      8,350        77.8        75.9        75.0        75.7
22      7,870        81.5        79.6        78.6        79.3
23      7,790        85.2        83.3        82.1        83.0
24      7,660        88.9        87.0        85.7        86.6
25      6,830        92.6        90.7        89.3        90.2
26      6,820        96.3        94.4        92.9        93.8
27      5,560       100.0        98.1        96.4        97.5

From a study of these four frequency lines, it is apparent that figure 1, which used equation
1b, is the least accurate of the four. The remaining three figures compare quite favorably;
the points on figure 2 and figure 4 define the line slightly better at the extremes than those
on figure 3. This could be caused by the data that were used. If a different sample had been
taken, it is possible that equation 3b, which was used in figure 3, would have produced better
results. The visual comparison is confirmed by a Chi-square test, which indicates no significant
difference among the four equations. It should be noted that the greatest differences between
the methods are at the extremes, where precise location is difficult because of the large errors
possible in the measurement of the discharge.
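
The visual comparison can be imitated numerically. The sketch below is an assumption-laden
stand-in, not the Chi-square test mentioned above: it reads the computed line (mean 4.0036 and
standard deviation 0.1205 from table 1) at each plotting position and reports the
root-mean-square departure of the observed logarithms from the line. The function names and
the choice of an RMS measure are illustrative only. Points that plot at 100 percent cannot be
placed on probability paper and are skipped.

```python
import math
from statistics import NormalDist

# Annual peaks in descending order (table 2) and the computed line (table 1).
PEAKS = [18000, 16400, 14200, 14000, 13300, 12900, 12800, 11400, 11100,
         11000, 10700, 10600, 10000, 9910, 9750, 9510, 9270, 9230, 8830,
         8630, 8350, 7870, 7790, 7660, 6830, 6820, 5560]
MEAN_LOG, SD_LOG = 4.0036, 0.1205


def position(eq, n, y):
    """Percent chance of occurrence by equation 1b, 2b, 3b, or 4b."""
    if eq == "1b":
        return 100.0 * n / y
    if eq == "2b":
        return 100.0 * (2 * n - 1) / (2 * y)
    if eq == "3b":
        return 100.0 * n / (y + 1)
    return (100.0 * n * (3 * y - 4) - 100.0 * (y - 2)) / (3 * y * (y - 1))


def rms_departure(eq):
    """RMS difference, in log units, between the observed peaks and the
    computed line read at each plotting position. Returns (rms, points used)."""
    y = len(PEAKS)
    squares = []
    for n, peak in enumerate(PEAKS, start=1):
        p = position(eq, n, y) / 100.0
        if not 0.0 < p < 1.0:
            continue  # equation 1b puts n = Y at 100 percent, off the paper
        # Log discharge on the line that is exceeded with probability p.
        line_log = MEAN_LOG + SD_LOG * NormalDist().inv_cdf(1.0 - p)
        squares.append((math.log10(peak) - line_log) ** 2)
    return math.sqrt(sum(squares) / len(squares)), len(squares)


for eq in ("1b", "2b", "3b", "4b"):
    rms, used = rms_departure(eq)
    print(f"equation {eq}: RMS departure {rms:.4f} log units over {used} points")
```

Equation 1b loses its smallest event because that point plots at 100 percent, which is one
practical sign of the weakness of method 1.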
CONCLUSIONS

In a study of the information presented here, the emphasis should be placed not on the
differences between the methods of finding plotting positions but on their similarity. If a
good sample is available, each method, especially the last three, will produce results that
are quite close. It is unfortunate, but true, that no method, either plotting or computing,
will produce a good, accurate frequency curve from a poor sample.

The one simple conclusion reached is that it is not the manner in which plotting positions
are found that determines whether a frequency line is good or bad, but the quality of the
sample itself. If the sample is good and is a true representation of the parent population,
then any of the last three methods discussed will give a good frequency line. If, however, the
sample is a poor one, then further investigations should be made in an attempt to adjust or
improve the quality of the sample, or to supplement it. If the data are used "as is," a poor
frequency line will probably be the result.